Parallel universes to improve the diagnosis of cardiac arrhythmias
We are interested in using parallel universes to learn interpretable models that can subsequently be used to automatically diagnose cardiac arrhythmias. In our study, parallel universes are heterogeneous sources, such as electrocardiograms, blood pressure measurements and phonocardiograms, that give relevant information about the cardiac state of a patient. To learn interpretable rules, we use an inductive logic programming (ILP) method on a symbolic version of our data. Aggregating the symbolic data coming from all the sources before learning increases both the number of possible relations that can be learned and the richness of the language. We propose a two-step strategy to deal with the resulting dimensionality problems when using ILP. First, rules are learned independently in each universe. Second, the learned rules are used to bias a new learning process on the aggregated data. The results show that this method is much more efficient than learning directly from the aggregated data. Furthermore, the good accuracy results confirm the benefits of using multiple sources when trying to improve the diagnosis of cardiac arrhythmias.
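The two-step strategy can be sketched with a toy propositional stand-in (the data, literal names, and rule format below are all invented for illustration; the paper uses a full ILP system on relational data, not this simplified scheme):

```python
from itertools import combinations

# Toy symbolic examples (invented data and literal names): each example
# gives, per source, a set of symbolic literals, plus an arrhythmia label.
examples = [
    ({"ecg": {"wide_qrs", "no_p_wave"}, "abp": {"low_systolic"}}, "vt"),
    ({"ecg": {"wide_qrs", "no_p_wave"}, "abp": {"normal_systolic"}}, "vt"),
    ({"ecg": {"narrow_qrs", "p_wave"}, "abp": {"normal_systolic"}}, "sinus"),
    ({"ecg": {"narrow_qrs", "p_wave"}, "abp": {"normal_systolic"}}, "sinus"),
]

def learn_rules(source):
    """Step 1: learn pure single-source rules 'body -> label'."""
    literals = sorted(set().union(*(ex[source] for ex, _ in examples)))
    rules = []
    for size in (1, 2):
        for body in combinations(literals, size):
            labels = [lbl for ex, lbl in examples if set(body) <= ex[source]]
            if labels and labels.count(labels[0]) == len(labels):  # pure rule
                rules.append((source, frozenset(body), labels[0]))
    return rules

# Step 2: bias the multisource search with the per-source rules: only
# combinations of already-learned bodies are explored, instead of all
# cross-source literal combinations.
per_source = learn_rules("ecg") + learn_rules("abp")
multisource = [
    ({s1: b1, s2: b2}, l1)
    for (s1, b1, l1), (s2, b2, l2) in combinations(per_source, 2)
    if s1 != s2 and l1 == l2  # cross-source bodies, consistent conclusion
]
```

The point of step 2 is the reduced search space: only pairs of already-validated rule bodies are enumerated, not every cross-source literal combination.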
Constraint-based Subspace Clustering
In high dimensional data, the general performance of traditional clustering algorithms decreases. This is partly because the similarity criterion used by these algorithms becomes inadequate in high dimensional space. Another reason is that some dimensions are likely to be irrelevant or to contain noisy data, thus hiding a possible clustering. To overcome these problems, subspace clustering techniques, which can automatically find clusters in relevant subsets of dimensions, have been developed. However, due to the huge number of subspaces to consider, these techniques often lack efficiency. In this paper we propose to extend the framework of bottom-up subspace clustering algorithms by integrating background knowledge and, in particular, instance-level constraints to speed up the enumeration of subspaces. We show how this new framework can be applied to both density- and distance-based bottom-up subspace clustering techniques. Our experiments on real datasets show that instance-level constraints can not only increase the efficiency of the clustering process but also the accuracy of the resulting clustering.
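A minimal sketch of constraint-based pruning in a bottom-up subspace enumeration (the data, the epsilon threshold, and the exact pruning test are invented for illustration; the paper's density- and distance-based instantiations are more involved):

```python
from itertools import combinations

# Toy data: 5 points in 3 dimensions; dimension 2 is pure noise (invented).
points = [
    (0.1, 0.2, 0.9), (0.2, 0.1, 0.1),  # cluster A in dims 0, 1
    (0.9, 0.8, 0.5), (0.8, 0.9, 0.2),  # cluster B in dims 0, 1
    (0.5, 0.5, 0.7),
]
must_link = [(0, 1), (2, 3)]     # instance pairs in the same cluster
cannot_link = [(0, 2), (1, 3)]   # instance pairs in different clusters

def dist(a, b, dims):
    """Max-distance between two points restricted to a subspace."""
    return max(abs(points[a][d] - points[b][d]) for d in dims)

def consistent(dims, eps=0.3):
    """A subspace is promising only if the constraints are satisfiable
    in it: must-link pairs close, cannot-link pairs far apart."""
    return (all(dist(a, b, dims) <= eps for a, b in must_link)
            and all(dist(a, b, dims) > eps for a, b in cannot_link))

# Bottom-up enumeration with constraint-based pruning (apriori-style):
# a k-dim candidate is generated only from surviving (k-1)-dim subspaces.
n_dims = 3
survivors = [(d,) for d in range(n_dims) if consistent((d,))]
all_subspaces = list(survivors)
while survivors:
    nxt = []
    for s in survivors:
        for d in range(s[-1] + 1, n_dims):
            cand = s + (d,)
            if consistent(cand):
                nxt.append(cand)
    all_subspaces += nxt
    survivors = nxt
```

Here the noisy dimension 2 violates the must-link constraints on its own, so every subspace containing it is pruned before any clustering work is done in it.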
Mining Mid-level Features for Image Classification
Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.
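The core idea of mining frequent local histograms can be sketched as follows (the grids of visual-word ids, patch size, and support threshold are invented; the actual FLH miner works on densely sampled descriptors and uses a relevance-based pattern selection, not a plain frequency cut):

```python
from collections import Counter

# Toy "images": grids of quantized low-level features (visual word ids).
images = [
    [[0, 0, 1], [0, 0, 1], [2, 2, 1]],
    [[0, 0, 2], [0, 0, 2], [1, 1, 2]],
]

def local_histograms(img, size=2):
    """Histogram of word ids in each size x size local patch."""
    hists = []
    for r in range(len(img) - size + 1):
        for c in range(len(img[0]) - size + 1):
            patch = [img[r + i][c + j] for i in range(size) for j in range(size)]
            hists.append(frozenset(Counter(patch).items()))  # hashable hist
    return hists

# Mine frequent local histograms (FLHs) across all images.
support = Counter(h for img in images for h in local_histograms(img))
flhs = [h for h, n in support.items() if n >= 2]

# Bag-of-FLH image representation: occurrence count of each FLH pattern.
def bag_of_flh(img):
    counts = Counter(local_histograms(img))
    return [counts[h] for h in flhs]
```

Note that each pattern keeps the full local histogram (word ids with their counts), rather than collapsing patches to single words as a plain bag-of-words would.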
On the benefits of self-taught learning for brain decoding
We study the benefits of using a large public neuroimaging database composed of fMRI statistic maps, in a self-taught learning framework, for improving brain decoding on new tasks. First, we leverage the NeuroVault database to train, on a selection of relevant statistic maps, a convolutional autoencoder to reconstruct these maps. Then, we use this trained encoder to initialize a supervised convolutional neural network to classify tasks or cognitive processes of unseen statistic maps from large collections of the NeuroVault database. We show that such a self-taught learning process always improves the performance of the classifiers, but that the magnitude of the benefits strongly depends on the number of data available both for pre-training and fine-tuning the models, and on the complexity of the targeted downstream task.
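The pretrain-then-fine-tune pipeline can be sketched with a deliberately tiny stand-in (random data instead of statistic maps, and a tied-weight linear autoencoder instead of the paper's convolutional one, so the weight-transfer idea stays visible in a few lines):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: a large unlabeled collection (pre-training) and a small
# labeled downstream set. All data here is synthetic.
X_pre = rng.normal(size=(200, 16))
X_sup = rng.normal(size=(40, 16))
y_sup = (X_sup[:, 0] + X_sup[:, 1] > 0).astype(float)

def recon_error(X, W):
    return float(np.mean((X @ W @ W.T - X) ** 2))

# Step 1: self-taught pre-training -- learn an encoder W by minimising
# reconstruction error on the unlabeled collection (gradient descent).
W = rng.normal(scale=0.1, size=(16, 4))
err_start = recon_error(X_pre, W)
for _ in range(500):
    E = X_pre @ W @ W.T - X_pre
    grad = 2 * (X_pre.T @ E @ W + E.T @ X_pre @ W) / len(X_pre)
    W -= 5e-3 * grad
err_end = recon_error(X_pre, W)

# Step 2: reuse the pre-trained encoder to initialise the supervised
# model, then train only a logistic-regression head on the labeled set.
H = X_sup @ W                    # features from the frozen encoder
v = np.zeros(W.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H @ v)))
    v -= 0.1 * H.T @ (p - y_sup) / len(H)
```

In the paper the encoder weights initialise a full convolutional network that is then fine-tuned end to end; freezing the encoder here just keeps the sketch short.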
Learning rules from multisource data for cardiac monitoring
This paper formalises the concept of learning symbolic rules from multisource data in a cardiac monitoring context. Our sources, electrocardiograms and arterial blood pressure measures, describe cardiac behaviours from different viewpoints. To learn interpretable rules, we use an Inductive Logic Programming (ILP) method. We develop an original strategy to cope with the dimensionality issues caused by using this ILP technique on a rich multisource language. The results show that our method greatly improves the feasibility and the efficiency of the process while remaining accurate. They also confirm the benefits of using multiple sources to improve the diagnosis of cardiac arrhythmias.
UniRank: Unimodal Bandit Algorithm for Online Ranking
We tackle a new emerging problem, which is finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, creating an algorithm with an expected regret matching $O(\frac{L\log(L)}{\Delta}\log(T))$ with $L$ players, $T$ iterations and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank}, we use the unimodality property of the expected reward on the appropriate graph to design an algorithm with a regret in $O(\frac{L}{\Delta}\log(T))$. Secondly, we show that by moving the focus towards the main question `\emph{Is user $i$ better than user $j$?}' this regret becomes $O(\frac{L}{\tilde{\Delta}}\log(T))$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Some experimental results finally show that these theoretical results are corroborated in practice.
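The pairwise-comparison idea behind the question "is user $i$ better than user $j$?" can be illustrated with a toy online-ranking loop (the hidden attractiveness values, the feedback model, and the adjacent-swap heuristic below are all invented simplifications, not the paper's algorithm or its guarantees):

```python
import random

random.seed(0)

# Hidden attractiveness of each "user" (item); unknown to the learner.
theta = [0.9, 0.7, 0.5, 0.3]
n = len(theta)
wins = [[0] * n for _ in range(n)]    # wins[i][j]: reward of i while above j

ranking = list(range(n))
random.shuffle(ranking)
for t in range(2000):
    # Display the current ranking, observe Bernoulli feedback per item.
    rewards = [1 if random.random() < theta[i] else 0 for i in ranking]
    # Answer 'is user i better than user j?' for adjacent pairs only,
    # and swap whenever the empirical comparison favours the lower item.
    for pos in range(n - 1):
        i, j = ranking[pos], ranking[pos + 1]
        wins[i][j] += rewards[pos]
        wins[j][i] += rewards[pos + 1]
        if wins[j][i] > wins[i][j]:
            ranking[pos], ranking[pos + 1] = j, i
```

Restricting the comparisons to neighbours in the current ranking is what the unimodality structure makes sound; a proper algorithm additionally uses confidence bounds rather than raw win counts.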
CAWET: Context-Aware Worst-Case Execution Time Estimation Using Transformers
This paper presents CAWET, a hybrid worst-case program timing estimation technique. CAWET identifies the longest execution path using static techniques, whereas the worst-case execution time (WCET) of basic blocks is predicted using an advanced language processing technique called Transformer-XL. By employing Transformer-XL in CAWET, the execution context formed by previously executed basic blocks is taken into account, allowing the micro-architecture of the processor pipeline to be considered without explicit modeling. Through a series of experiments on the TacleBench benchmarks, using different target processors (Arm Cortex M4, M7, and A53), our method is demonstrated to never underestimate WCETs and is shown to be less pessimistic than its competitors.
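The static half of such a hybrid pipeline can be sketched as a longest-path computation over a control-flow graph whose per-block costs are supplied by a predictor (the CFG, block names, and cycle counts below are invented; in CAWET the per-block times come from a context-aware Transformer-XL model, and the path analysis is typically ILP-based rather than this simple acyclic recursion):

```python
# Per-basic-block WCETs, as a learned predictor might supply them.
block_wcet = {"entry": 5, "a": 12, "b": 7, "exit": 3}   # cycles (invented)
# Acyclic control-flow graph: block -> list of successor blocks.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}

def longest_path(node, memo={}):
    """Worst-case cycles from `node` to program exit (acyclic CFG)."""
    if node not in memo:
        succ = cfg[node]
        memo[node] = block_wcet[node] + (max(map(longest_path, succ)) if succ else 0)
    return memo[node]
```

Here the worst-case path is entry -> a -> exit, giving 5 + 12 + 3 = 20 cycles; CAWET's contribution is making the per-block numbers context-sensitive, so the same block can get different costs depending on which blocks executed before it.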
Recherche efficace de motifs fréquents dans des grilles
General-purpose exhaustive graph mining algorithms are seldom used in real-life contexts due to the high complexity of the process, mostly based on costly isomorphism tests and countless expansion possibilities. In this paper, we show how to exploit grid-based representations to efficiently extract frequent grid subgraphs, and we introduce an efficient grid mining algorithm called GriMA designed to scale to large amounts of data. We apply our algorithm to image classification problems, for which we propose a bag-of-grids image representation. Experiments show that our algorithm is efficient and that adding structure may help the image classification process.
GriMa: a Grid Mining Algorithm for Bag-of-Grid-Based Classification
General-purpose exhaustive graph mining algorithms have seldom been used in real-life contexts due to the high complexity of the process, which is mostly based on costly isomorphism tests and countless expansion possibilities. In this paper, we explain how to exploit grid-based representations of problems to efficiently extract frequent grid subgraphs and create bags of grids which can be used as new features for classification purposes. We provide an efficient grid mining algorithm called GriMA which is designed to scale to large amounts of data. We apply our algorithm to image classification problems where typical Bag-of-Visual-Words-based techniques are used. However, those techniques make use of only limited spatial information in the image, whereas such information could be beneficial to obtain more discriminative features. Experiments on different datasets show that our algorithm is efficient and that adding the structure may greatly help the image classification process.
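The bag-of-grids idea can be sketched by mining fixed-shape sub-grid patterns (the label grids, pattern shape, and support threshold are invented; GriMA itself mines general frequent grid subgraphs of varying shapes, which is where the efficiency challenge lies):

```python
from collections import Counter

# Toy grids: images represented as grids of discrete labels (invented).
grids = [
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [2, 2, 2]],
    [[1, 1, 2], [1, 1, 2], [2, 2, 2]],
]

def subgrids(grid, h=2, w=2):
    """Enumerate all h x w sub-grid patterns of a grid."""
    for r in range(len(grid) - h + 1):
        for c in range(len(grid[0]) - w + 1):
            yield tuple(tuple(grid[r + i][c + j] for j in range(w))
                        for i in range(h))

# Support of a pattern = number of grids containing it at least once.
support = Counter()
for g in grids:
    for pat in set(subgrids(g)):
        support[pat] += 1

frequent = [p for p, n in support.items() if n >= 2]

# Bag-of-grids feature vector: occurrence count of each frequent pattern.
def bag_of_grids(grid):
    counts = Counter(subgrids(grid))
    return [counts[p] for p in frequent]
```

Each image then becomes a fixed-length vector over the mined patterns, usable as input to any standard classifier, in the same way bag-of-visual-words vectors are, but with local structure preserved.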